Process tracing
The big picture: Intuition
Simple insight: If you believe this model, then seeing \(M\) should tell you something about a query regarding the \(I\), \(D\) relationship in a case.
For instance, we might have the intuition: If there was no mobilization in a high-inequality case that democratized, then inequality didn’t cause the transition.
But how do we formalize this strategy?
Key formal insight:
If you believe this model, then seeing \(M\) should tell you something about the \(\theta\)s—which are what define the effect of \(I\) on \(D\).
We are focusing on a case
We start with two things: a case's observed causal conditions and its observed outcome
What we don’t know is whether any of those conditions caused the outcome
So we collect more information “from other parts of the DAG” to help figure that out
Let’s walk through the intuition
Suppose we observe democratization as the outcome in Malawi: \(D=1\)
We also observe high inequality in Malawi: \(I=1\)
We want to know: did \(I=1\) cause \(D=1\) in Malawi?
Suppose we go to the field and we learn that mass mobilization DID occur in Malawi
What can we conclude?
NOTHING YET!
We knew \(I=1\), \(D=1\)
We then saw \(M=1\)
But which is more consistent with \(I=1\) causing \(D=1\)?
The answer depends on our prior beliefs: in particular, beliefs about which mechanism is most likely operating
If we have already seen \(I=1, D=1\)
Linked positive effects requires us to then see: \(M=1\)
Linked negative effects requires us to then see: \(M=0\)
So which \(M\) value is more consistent with a positive \(I \rightarrow D\) effect depends on which of these is more common in the world: linked positive effects or linked negative effects
We need to draw on our theoretical beliefs
Suppose we believe that a positive \(I \rightarrow M\) effect occurs with probability 0.3, and a positive \(M \rightarrow D\) effect with probability 0.6
Suppose we also believe that a negative \(I \rightarrow M\) effect occurs with probability 0.1, and a negative \(M \rightarrow D\) effect with probability 0.1
Then the probability of linked positive effects is simply: \(0.3 \times 0.6 = 0.18\)
Probability of linked negative effects is simply: \(0.1 \times 0.1 = 0.01\)
So we believe linked positive effects are a MUCH more common way of generating a positive \(I \rightarrow D\) effect than linked negative effects
Given \(I=1\) and \(D=1\), there can only be linked positive effects IF mobilization occurs (\(M=1\))
So if we observe \(M=1\), we will think it’s more likely that high inequality DID cause democratization
If we observe \(M=0\), we think it’s less likely that high inequality caused democratization
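This comparison can be checked with a few lines of base R, using the link probabilities assumed above:

```r
# Beliefs assumed above about each causal link
p_pos_IM <- 0.3  # prob. I has a positive effect on M
p_pos_MD <- 0.6  # prob. M has a positive effect on D
p_neg_IM <- 0.1  # prob. I has a negative effect on M
p_neg_MD <- 0.1  # prob. M has a negative effect on D

# Probability of each mechanism that yields a positive I -> D effect
linked_positive <- p_pos_IM * p_pos_MD  # consistent with observing M = 1
linked_negative <- p_neg_IM * p_neg_MD  # consistent with observing M = 0
c(linked_positive, linked_negative)  # 0.18 vs 0.01: M = 1 supports the effect
```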
Alternatively…
Suppose we believe that a positive \(I \rightarrow M\) effect occurs with probability 0.2, and a positive \(M \rightarrow D\) effect with probability 0.1
Suppose we also believe that a negative \(I \rightarrow M\) effect occurs with probability 0.2, and a negative \(M \rightarrow D\) effect with probability 0.6
Probability of linked positive effects is: \(0.2 \times 0.1 = 0.02\)
Probability of linked negative effects is: \(0.2 \times 0.6 = 0.12\)
Now linked negative effects are a MUCH more common way of generating a positive \(I \rightarrow D\) effect than linked positive effects
Under these theoretical beliefs, \(M=0\) would be more consistent than \(M=1\) with a positive \(I \rightarrow D\) effect
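The same two-line calculation, rerun with the alternative beliefs assumed above, flips the conclusion:

```r
# Alternative beliefs assumed above about each causal link
p_pos_IM <- 0.2; p_pos_MD <- 0.1  # positive links
p_neg_IM <- 0.2; p_neg_MD <- 0.6  # negative links

linked_positive <- p_pos_IM * p_pos_MD  # 0.02, consistent with M = 1
linked_negative <- p_neg_IM * p_neg_MD  # 0.12, consistent with M = 0
linked_negative > linked_positive  # TRUE: now M = 0 supports the effect
```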
To recap:
We want to know if \(I=1\) caused \(D=1\) in Malawi
Given our DAG, there are two ways to generate a positive effect of \(I\) on \(D\): linked positive effects and linked negative effects
When we see \(M=1\) in Malawi, what we should conclude about \(I\)’s effect on \(D\) depends on how we think the world works
This defines what ways of generating a positive \(I \rightarrow D\) are most/least common
Which tells us which value of \(M\) is most consistent with such an effect
The evidence in process tracing never speaks for itself
Our inferences from PT evidence always depend on theory
Most process tracing is either silent about these beliefs or expresses them informally
We can formalize these beliefs
Prior beliefs we use in the book (roughly):
We process trace and observe that mobilization occurred: \(M=1\)
Given our priors, this observation increases our confidence that \(I=1\) caused \(D=1\)
Process tracing for several cases
Figure 1: Logic of simple updating on arbitrary queries.
We can tell when some evidence might potentially matter
We can say much more… actually making (model-dependent) inferences
Key advantages of using a causal model:
Our DAG is:
\(X \rightarrow M \rightarrow Y\)
And we believe each causal type is equally likely (flat priors)
What are the types? How likely is each one? How likely is each given the data?
| Type | X | M | Y | prob | Query? | Data? |
|---|---|---|---|---|---|---|
| X = 0, X causes M, M causes Y | 0 | 0 | 0 | 1/8 | ✓ | ✓ |
| X = 0, X causes M, M does not cause Y | 0 | 0 | 0 | 1/8 | | ✓ |
| X = 0, X does not cause M, M causes Y | 0 | 0 | 0 | 1/8 | | ✓ |
| X = 0, X does not cause M, M does not cause Y | 0 | 0 | 0 | 1/8 | | ✓ |
| X = 1, X causes M, M causes Y | 1 | 1 | 1 | 1/8 | ✓ | |
| X = 1, X causes M, M does not cause Y | 1 | 1 | 0 | 1/8 | | |
| X = 1, X does not cause M, M causes Y | 1 | 0 | 0 | 1/8 | | |
| X = 1, X does not cause M, M does not cause Y | 1 | 0 | 0 | 1/8 | | |
Define a model
Get types consistent with query
Get mapping from causal types to consistent data types
Get prior probabilities of each causal type
model |>
  grab(what = "ambiguities_matrix") |>
  data.frame() |>
  mutate(
    in_query = get_query_types(model, query)$types,
    priors = CausalQueries:::get_type_prob(model)) |>
  kable()

| | X0M0Y0 | X1M0Y0 | X1M1Y0 | X1M1Y1 | in_query | priors |
|---|---|---|---|---|---|---|
| X0M00Y00 | 1 | 0 | 0 | 0 | FALSE | 0.125 |
| X1M00Y00 | 0 | 1 | 0 | 0 | FALSE | 0.125 |
| X0M01Y00 | 1 | 0 | 0 | 0 | FALSE | 0.125 |
| X1M01Y00 | 0 | 0 | 1 | 0 | FALSE | 0.125 |
| X0M00Y01 | 1 | 0 | 0 | 0 | FALSE | 0.125 |
| X1M00Y01 | 0 | 1 | 0 | 0 | FALSE | 0.125 |
| X0M01Y01 | 1 | 0 | 0 | 0 | TRUE | 0.125 |
| X1M01Y01 | 0 | 0 | 0 | 1 | TRUE | 0.125 |
Causal queries generated by query_model (all at population level)
|label |using |mean |
|:-------------------------------------------|:----------|----:|
|Y[X=1] > Y[X=0] given X==0 & M==0 & Y==0 |parameters | 0.25|
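The 0.25 posterior can also be reproduced by direct enumeration in base R, without CausalQueries: a sketch under the flat-prior \(X \rightarrow M \rightarrow Y\) model above.

```r
# Enumerate the 8 causal types of the X -> M -> Y model, each with prior 1/8
types <- expand.grid(
  X          = 0:1,
  X_causes_M = c(TRUE, FALSE),
  M_causes_Y = c(TRUE, FALSE)
)
types$prob <- 1 / nrow(types)

# Realized values implied by each type (M and Y are 0 unless caused)
types$M <- as.integer(types$X == 1 & types$X_causes_M)
types$Y <- as.integer(types$M == 1 & types$M_causes_Y)

# Query: X has a positive effect on Y (both links operative)
in_query <- types$X_causes_M & types$M_causes_Y
# Data: we observe X = 0, M = 0, Y = 0
in_data  <- types$X == 0 & types$M == 0 & types$Y == 0

# Posterior that the query holds, given the data
sum(types$prob[in_query & in_data]) / sum(types$prob[in_data])  # 0.25
```

Four of the eight types are consistent with the data, and exactly one of those is in the query, giving \(0.125 / 0.5 = 0.25\).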
Also try our Shiny app